Graphlet Data Mining of Energetical Interaction Patterns in Protein 3D Structures
Abstract
Interactions between secondary structure elements (SSEs) in the core of proteins are evolutionarily conserved and define the overall fold of proteins. They can therefore be used to classify protein families. Using a graph representation of SSE interactions and data mining techniques, we identify overrepresented graphlets that can be used for protein classification. In total, we find 627 significant graphlets within the ICGEB Protein Benchmark database (SCOP40mini) and the Super-Secondary Structure database (SSSDB). Using these graphlets as features, decision trees predict the four SCOP levels and the SSSDB (sub)motif classes with a mean Area Under the Curve (AUC) above 0.89 (5-fold cross-validation). Regularized decision trees reveal that about 20 graphlets per classification task suffice for reliable predictions. Graphlets composed of five secondary structure interactions are the most informative. Finally, we find that graphlets can be predicted from secondary structure using decision trees (5-fold cross-validation) with a Matthews Correlation Coefficient (MCC) of up to 0.7.
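The abstract does not say exactly how graphlets are enumerated or labelled, so the following is only a minimal sketch under stated assumptions. It assumes that each SSE is a node labelled "H" (helix) or "E" (strand), that each SSE interaction is an undirected edge, and that a "graphlet of k interactions" is a connected subset of k edges. Two such subsets count as the same graphlet when their labelled graphs are isomorphic. The energetic weights of the interactions are ignored. The names `count_graphlets`, `canonical_form` and `mcc` are hypothetical. The resulting count vectors are the kind of feature a decision tree could consume, and `mcc` shows the metric reported in the abstract.

```python
from collections import Counter
from itertools import combinations, permutations
from math import sqrt


def is_connected(edges):
    """Return True if the given undirected edges form a single component."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    start = next(iter(adj))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == len(adj)


def canonical_form(labels, edges):
    """Smallest (node labels, edge list) key over all node relabellings.

    Isomorphic labelled graphs produce the same key. Brute force is acceptable
    because graphlets here span only a handful of nodes (assumption)."""
    nodes = sorted({n for e in edges for n in e})
    best = None
    for perm in permutations(range(len(nodes))):
        relabel = dict(zip(nodes, perm))
        node_key = tuple(sorted((relabel[n], labels[n]) for n in nodes))
        edge_key = tuple(sorted(tuple(sorted((relabel[a], relabel[b])))
                                for a, b in edges))
        key = (node_key, edge_key)
        if best is None or key < best:
            best = key
    return best


def count_graphlets(labels, edges, k):
    """Count connected k-edge subgraphs, grouped by labelled isomorphism class."""
    counts = Counter()
    for subset in combinations(edges, k):
        if is_connected(subset):
            counts[canonical_form(labels, subset)] += 1
    return counts


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; 0.0 when any marginal is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy "protein" (hypothetical): two helices and two strands in a contact chain.
labels = {0: "H", 1: "H", 2: "E", 3: "E"}
edges = [(0, 1), (1, 2), (2, 3)]
features = count_graphlets(labels, edges, k=2)
print(len(features), sum(features.values()))  # 2 2
```

In the toy example, the two connected two-edge subsets (H-H-E and H-E-E) are distinct graphlets, and the disconnected pair of edges is skipped. Canonicalization by trying every node permutation is exponential. It stays cheap here because a connected graphlet with five interactions has at most six nodes, which gives 720 permutations. A real pipeline would more likely rely on a dedicated frequent-subgraph miner, such as gSpan.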
Similar resources
Network approach integrates 3D structural and sequence data to improve protein classification
Motivation: Early approaches for protein (structural) classification were sequence-based. Since amino acids that are distant in the sequence can be close in the 3-dimensional (3D) structure, 3D contact approaches can complement sequence approaches. Traditional 3D contact approaches study 3D structures directly. Instead, 3D structures can first be modeled as protein structure networks (PSNs). Th...
A 3D Finite-Difference Analysis of Interaction between a Newly-Driven Large Tunnel with Twin Tunnels in Urban Areas
Evaluating the interaction between new and existing underground structures is an important problem in urban tunneling. In this work, four numerical models of single- and twin-tube tunnels in urban areas are developed in FLAC3D, with the horizontal distance between the single- and twin-tube tunnels varied across models. The aim is to analyze the effects of the horizontal dista...
Dimensionality analysis of subsurface structures in magnetotellurics using different methods (a case study: oil field in Southwest of Iran)
The magnetotelluric (MT) method is an electromagnetic technique that uses the Earth's natural field to map changes in electrical resistivity in subsurface structures. Because the electromagnetic fields in this method penetrate deeply (tens of meters to tens of kilometers), MT data are used to investigate shallow to deep subsurface geoelectrical structures and their dimensions....
Graphlet Kernels for Prediction of Functional Residues in Protein Structures
We introduce a novel graph-based kernel method for annotating functional residues in protein structures. A structure is first modeled as a protein contact graph, where nodes correspond to residues and edges connect spatially neighboring residues. Each vertex in the graph is then represented as a vector of counts of labeled non-isomorphic subgraphs (graphlets), centered on the vertex of interest...
3D gravity data-space inversion with sparseness and bound constraints
One of the most important foundations of gravity data inversion is recognizing sharp boundaries between an ore body and its host rocks during interpretation. This work therefore develops an inversion approach to determine a 3D density distribution that produces a given gravity anomaly. The subsurface model consists of 3D rectangular prisms of known sizes ...
Publication date: 2010